Abstract

This paper presents a method to combine a set of unsupervised
algorithms that can accurately disambiguate word senses in a large,
completely untagged corpus. Although most of the techniques for word
sense resolution have been presented as stand-alone, it is our belief
that full-fledged lexical ambiguity resolution should combine several
information sources and techniques. The set of techniques has been
applied in a combined way to disambiguate the genus terms of two
machine-readable dictionaries (MRD), enabling us to construct complete
taxonomies for Spanish and French. Tested accuracy is above 80%
overall and 95% for two-way ambiguous genus terms, showing that
taxonomy building is not limited to structured dictionaries such as
LDOCE.



1 Introduction 

While in English the "lexical bottleneck" problem (Briscoe, 1991)
seems to be softened (e.g. WordNet (Miller, 1990), Alvey Lexicon
(Grover et al., 1993), COMLEX (Grishman et al., 1994), etc.) there are
no available wide range lexicons for natural language processing (NLP)
for other languages. Manual construction of lexicons is the most
reliable technique for obtaining structured lexicons but is costly and
highly time-consuming. This is why many researchers have focused on
the massive acquisition of lexical knowledge and semantic information
from pre-existing structured lexical resources as automatically as
possible.

As dictionaries are special texts whose subject matter is a language
(or a pair of languages in the case of bilingual dictionaries) they
provide a wide range of information about words by giving definitions
of senses of words and, in doing so, supply knowledge not just about
language, but about the world itself.

One of the most important relations to be extracted from
machine-readable dictionaries (MRD) is the hyponym/hypernym relation
among dictionary senses (e.g. (Amsler, 1981), (Vossen and Serail,
1990)), not only because of its own importance as the backbone of
taxonomies, but also because this relation acts as the support of the
main inheritance mechanisms, thus helping the acquisition of other
relations and semantic features (Cohen and Loiselle), providing formal
structure and avoiding redundancy in the lexicon (Briscoe et al.,
1990). For instance, following the natural chain of dictionary senses
described in the Diccionario General Ilustrado de la Lengua Española
(DGILE, 1987) we can discover that a bonsai is a cultivated plant or
bush.

bonsai_1_2 planta y arbusto así cultivado.

(bonsai, plant and bush cultivated in that way) 

The hyponym/hypernym relation appears between the entry word (e.g.
bonsai) and the genus term, or the core of the phrase (e.g. planta and
arbusto). Thus, a dictionary definition is usually written to employ a
genus term combined with differentia which distinguish the word being
defined from other words with the same genus term^1.

As lexical ambiguity pervades language in texts, the words used in
dictionaries are themselves lexically ambiguous. Thus, when
constructing complete disambiguated taxonomies, the correct dictionary
sense of the genus term must be selected in each dictionary
definition, performing what is usually called Word Sense
Disambiguation (WSD)^2. In the previous example planta has thirteen
senses and arbusto only one.

^1 For other kinds of definition patterns not based on genus, a
genus-like term was added after studying those patterns.

                                      DGILE                  LPPL
                                overall      nouns     overall      nouns
headwords                        93,484     53,799      15,953     10,506
senses                          168,779     93,275      22,899     13,740
total number of words         1,227,380    903,163      97,778     66,323
average length of definition       7.26       9.68        3.27       3.82

Table 1: Dictionary Data

Although a large set of dictionaries has been exploited as lexical
resources, the most widely used monolingual MRD for NLP is LDOCE,
which was designed for learners of English. It is clear that different
dictionaries do not contain the same explicit information. The
information placed in LDOCE has made it easy to extract other implicit
information, e.g. taxonomies (Bruce et al., 1992). Does this mean that
only highly structured dictionaries like LDOCE are suitable to be
exploited to provide lexical resources for NLP systems?

We explored this question by probing two disparate dictionaries:
Diccionario General Ilustrado de la Lengua Española (DGILE, 1987) for
Spanish, and Le Plus Petit Larousse (LPPL, 1980) for French. Both are
substantially poorer in coded information than LDOCE (LDOCE, 1987)^3.
These dictionaries are very different in number of headwords, polysemy
degree, size and length of definitions (cf. Table 1). While DGILE is a
good example of a large sized dictionary, LPPL shows to what extent
the smallest dictionary is useful.

Even if most of the techniques for WSD are presented as stand-alone,
it is our belief, following the ideas of (McRoy, 1992), that
full-fledged lexical ambiguity resolution should combine several
information sources and techniques. This work does not address all the
heuristics cited in her paper, but profits from techniques that were
at hand, without any claim of them being complete. In fact we use
unsupervised techniques, i.e. those that do not require hand-coding of
any kind, that draw knowledge from a variety of sources (the source
dictionaries, bilingual dictionaries and WordNet) in diverse ways.



^2 Also called Lexical Ambiguity Resolution, Word Sense
Discrimination, Word Sense Selection or Word Sense Identification.

^3 In LDOCE, dictionary senses are explicitly ordered by frequency,
86% of dictionary senses have semantic codes and 44% of dictionary
senses have pragmatic codes.



This paper tries to prove that, using an appropriate method to combine
those heuristics, we can disambiguate the genus terms with reasonable
precision, and thus construct complete taxonomies from any
conventional dictionary in any language.

This paper is organized as follows. After this short introduction,
section 2 shows the methods we have applied. Section 3 describes the
test sets and shows the results. Section 4 explains the construction
of the lexical knowledge resources used. Section 5 discusses previous
work, and finally, section 6 draws some conclusions and comments on
future work.

2 Heuristics for Genus Sense Disambiguation

As the methods described in this paper have been developed to be
applied in a combined way, each one must be seen as a container of
some part of the knowledge (or heuristic) needed to disambiguate the
correct hypernym sense. Not all the heuristics are suitable to be
applied to all definitions. For combining the heuristics, each
heuristic assigns each candidate hypernym sense a normalized weight,
i.e. a real number ranging from 0 to 1 (after a scaling process, where
the maximum score is assigned 1, cf. section 2.9). The heuristics
applied range from the simplest (e.g. heuristics 1, 2, 3 and 4) to the
most informed ones (e.g. heuristics 5, 6, 7 and 8), and use
information present in the entries under study (e.g. heuristics 1, 2,
3 and 4), or extracted from the whole dictionary as a unique lexical
knowledge resource (e.g. heuristics 5 and 6), or combining lexical
knowledge from several heterogeneous lexical resources (e.g.
heuristics 7 and 8).

2.1 Heuristic 1: Monosemous Genus Term 

This heuristic is applied when the genus term is 
monosemous. As there is only one hypernym sense 
candidate, the hyponym sense is attached to it. Only 
12% of noun dictionary senses have monosemous 
genus terms in DGILE, whereas the smaller LPPL 
reaches 40%. 

2.2 Heuristic 2: Entry Sense Ordering 

This heuristic assumes that senses are ordered in an 
entry by frequency of usage. That is, the most used 
and important senses are placed in the entry before 
less frequent or less important ones. This heuristic 
provides the maximum score to the first sense of the 
hypernym candidates and decreasing scores to the 
others. 
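
The scoring scheme of this heuristic can be illustrated with a minimal sketch. The paper does not state the exact decreasing function, so the 1/k weighting below is purely an assumption, and the function name `sense_ordering_weights` is hypothetical.

```python
def sense_ordering_weights(num_senses):
    """Weight candidate hypernym senses by their position in the entry.

    Assumption: the k-th sense (1-based) receives weight 1/k. The text
    only says that the first sense gets the maximum score and that the
    remaining senses get decreasing scores.
    """
    return [1.0 / k for k in range(1, num_senses + 1)]


# Weights for a genus term with four senses, first sense first.
weights = sense_ordering_weights(4)
```

Any other strictly decreasing function would fit the description equally well; after the normalization of section 2.9 only the relative sizes of the weights matter.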



2.3 Heuristic 3: Explicit Semantic Domain 

This heuristic assigns the maximum score to the hypernym sense which
has the same semantic domain tag as the hyponym. This heuristic is of
limited application: LPPL lacks semantic tags, and less than 10% of
the definitions in DGILE are marked with one of the 96 different
semantic domain tags (e.g. med. for medicine, or der. for law, etc.).

2.4 Heuristic 4: Word Matching 

This heuristic trusts that related concepts will be 
expressed using the same content words. Given 
two definitions - that of the hyponym and that of 
one candidate hypernym - this heuristic computes 
the total amount of content words shared (including 
headwords). Due to the morphological productivity 
of Spanish and French, we have considered differ- 
ent variants of this heuristic. For LPPL the match 
among lemmas proved most useful, while DGILE 
yielded better results when matching the first four 
characters of words. 
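
As an illustration of the two variants just described, the following sketch counts the content words shared by two definitions, either as whole tokens or truncated to their first four characters. It assumes whitespace-tokenized, punctuation-free definitions and counts each shared word once (word types, not tokens); the function names and the stop-word list are hypothetical, and lemmatization (the variant preferred for LPPL) is not shown.

```python
def content_words(definition, stop_words, prefix_len=None):
    """Return the set of content words, optionally cut to a fixed prefix."""
    words = [w.lower() for w in definition.split()]
    words = [w for w in words if w not in stop_words]
    if prefix_len is not None:
        words = [w[:prefix_len] for w in words]
    return set(words)


def word_matching_score(hyponym_def, hypernym_def, stop_words, prefix_len=4):
    """Number of content words (or word prefixes) shared by two definitions."""
    shared = content_words(hyponym_def, stop_words, prefix_len) & content_words(
        hypernym_def, stop_words, prefix_len
    )
    return len(shared)


STOP_WORDS = {"y", "de", "la", "el", "que"}  # toy list, for illustration only

# "cultivado" and "cultivar" only match once truncated to "cult".
exact = word_matching_score(
    "arbusto cultivado", "cultivar plantas", STOP_WORDS, prefix_len=None
)
prefix = word_matching_score(
    "arbusto cultivado", "cultivar plantas", STOP_WORDS, prefix_len=4
)
```

Prefix truncation acts here as a cheap stand-in for morphological analysis, which is one plausible reading of why it helped on the larger dictionary.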

2.5 Heuristic 5: Simple Cooccurrence 

This heuristic uses cooccurrence data collected from 
the whole dictionary (see section 4.1 for more de- 
tails). Thus, given a hyponym definition (O) and a 
set of candidate hypernym definitions, this method 
selects the candidate hypernym definition (E) which 
returns the maximum score given by formula (1): 



$$SC(O,E) = \sum_{w_i \in O \wedge w_j \in E} cw(w_i, w_j) \qquad (1)$$



The cooccurrence weight (cw) between two words can be given by
Cooccurrence Frequency, Mutual Information (Church and Hanks, 1990) or
Association Ratio (Resnik, 1995). We tested them using different
context window sizes. Best results were obtained in both dictionaries
using the Association Ratio. In DGILE window size 7 proved the most
suitable, whereas in LPPL whole definitions were used.
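
The scoring step of this heuristic can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: cooccurrence is counted over whole definitions (no context window), pointwise mutual information stands in for the cooccurrence weight cw, pairs never seen together score 0, and all function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def build_cooccurrence(definitions):
    """Count words and unordered word pairs, once per definition."""
    word_counts, pair_counts = Counter(), Counter()
    for words in definitions:
        unique = sorted(set(words))
        word_counts.update(unique)
        pair_counts.update(combinations(unique, 2))
    return word_counts, pair_counts, len(definitions)


def pmi(w1, w2, word_counts, pair_counts, n_defs):
    """Pointwise mutual information; 0.0 for pairs never seen together."""
    joint = pair_counts[tuple(sorted((w1, w2)))]
    if joint == 0:
        return 0.0
    return math.log2(joint * n_defs / (word_counts[w1] * word_counts[w2]))


def simple_cooccurrence(hyponym_words, hypernym_words, cw):
    """Sum cw over every pair (w_i in O, w_j in E)."""
    return sum(cw(wi, wj) for wi in hyponym_words for wj in hypernym_words)


# Toy dictionary of four tokenized definitions.
toy_definitions = [
    ["vino", "tinto", "uva"],
    ["vino", "beber"],
    ["pan", "harina"],
    ["uva", "fruto"],
]
counts = build_cooccurrence(toy_definitions)
score = simple_cooccurrence(
    ["vino"], ["tinto", "pan"], lambda a, b: pmi(a, b, *counts)
)
```

Passing `cw` as a function keeps the scoring independent of the weighting scheme, so Cooccurrence Frequency or the Association Ratio could be swapped in without touching `simple_cooccurrence`.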

2.6 Heuristic 6: Cooccurrence Vectors 

This heuristic is based on the method presented in (Wilks et al.,
1993), which also uses cooccurrence data collected from the whole
dictionary (cf. section 4.1). Given a hyponym definition (O) and a set
of candidate hypernym definitions, this method selects the candidate
hypernym (E) which returns the maximum score following formula (2):

$$CV(O,E) = sim(V_O, V_E) \qquad (2)$$

The similarity (sim) between two definitions can be measured by the
dot product, the cosine function or the Euclidean distance between two
vectors ($V_O$ and $V_E$) which represent the contexts of the words
present in the respective definitions, following formula (3):

$$V_{Def} = \sum_{w_i \in Def} civ(w_i) \qquad (3)$$

The vector for a definition ($V_{Def}$) is computed by adding the
cooccurrence information vectors of the words in the definition
($civ(w_i)$). The cooccurrence information vector for a word is
collected from the whole dictionary using Cooccurrence Frequency,
Mutual Information or Association Ratio. The best combination varies
for each dictionary: whereas the dot product, Association Ratio, and
window size 7 proved best for DGILE, the cosine, Mutual Information
and whole definitions were preferred for LPPL.

2.7 Heuristic 7: Semantic Vectors

Because both LPPL and DGILE are poorly semantically coded we decided
to enrich the dictionaries by automatically assigning a semantic tag
to each dictionary sense (see section 4.2 for more details). Instead
of assigning only one tag we can attach to each dictionary sense a
vector with weights for each of the 25 semantic tags we considered
(which correspond to the 25 lexicographer files of WordNet (Miller,
1990)). In this case, given a hyponym (O) and a set of possible
hypernyms we select the candidate hypernym (E) which yields maximum
similarity among semantic vectors:

$$SV(O,E) = sim(V_O, V_E) \qquad (4)$$

where sim can be the dot product, cosine or Euclidean distance, as
before. Each dictionary sense has been semantically tagged with a
vector of semantic weights following formula (5).

$$V_{Def} = \sum_{w_i \in Def} swv(w_i) \qquad (5)$$

The salient word vector (swv) for a word contains a saliency weight
(Yarowsky, 1992) for each of the 25 semantic tags of WordNet. Again,
the best method differs from one dictionary to the other: each one
prefers the method used in the previous section.
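
Heuristics 6 and 7 share the same machinery: the vectors of the words in a definition are added up, and the candidate whose vector is most similar to that of the hyponym is chosen. The sketch below illustrates this with sparse dictionaries and cosine similarity. The toy word vectors, the sense labels and all function names are assumptions made for the example; real vectors would hold cooccurrence weights (heuristic 6) or saliency weights over the 25 semantic tags (heuristic 7).

```python
import math
from collections import defaultdict


def definition_vector(words, word_vectors):
    """Add up the sparse vectors of the words in a definition."""
    total = defaultdict(float)
    for w in words:
        for dim, value in word_vectors.get(w, {}).items():
            total[dim] += value
    return dict(total)


def dot(u, v):
    """Dot product of two sparse vectors stored as dicts."""
    return sum(value * v.get(dim, 0.0) for dim, value in u.items())


def cosine(u, v):
    """Cosine similarity; 0.0 when either vector is empty."""
    norm = math.sqrt(dot(u, u)) * math.sqrt(dot(v, v))
    return dot(u, v) / norm if norm else 0.0


def best_hypernym(hyponym_words, candidates, word_vectors, sim=cosine):
    """Return the candidate sense whose definition vector is most similar."""
    v_o = definition_vector(hyponym_words, word_vectors)
    return max(
        candidates,
        key=lambda s: sim(v_o, definition_vector(candidates[s], word_vectors)),
    )


# Toy data: dimensions are made-up semantic tags.
toy_vectors = {
    "arbusto": {"plant": 2.0},
    "cultivado": {"plant": 1.0, "act": 1.0},
    "vegetal": {"plant": 3.0},
    "fabrica": {"artifact": 2.0, "act": 1.0},
}
toy_candidates = {"planta_1": ["vegetal"], "planta_2": ["fabrica"]}
chosen = best_hypernym(["arbusto", "cultivado"], toy_candidates, toy_vectors)
```

The `sim` parameter can be replaced by `dot` to obtain the configuration reported as best for DGILE; a Euclidean distance would need its sign flipped, since `best_hypernym` maximizes.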

2.8 Heuristic 8: Conceptual Distance 

Conceptual distance provides a basis for determining 
closeness in meaning among words, taking as refer- 
ence a structured hierarchical net. Conceptual dis- 
tance between two concepts is essentially the length 



of the shortest path that connects the concepts in 
the hierarchy. In order to apply conceptual distance, 
WordNet was chosen as the hierarchical knowledge 
base, and bilingual dictionaries were used to link 
Spanish and French words to the English concepts. 

Given a hyponym definition (O) and a set of candidate hypernym
definitions, this heuristic chooses the hypernym definition (E) which
is closest according to the following formula:

$$CD(O,E) = dist(headword_O, genus_E) \qquad (6)$$

That is, Conceptual Distance is measured between the headword of the
hyponym definition and the genus of the candidate hypernym definitions
using formula (7), cf. (Agirre et al., 1994). To compute the distance
between any two words ($w_1$, $w_2$), all the corresponding concepts
in WordNet ($c_{1i}$, $c_{2j}$) are searched via a bilingual
dictionary, and the minimum, over each possible combination of
$c_{1i}$ and $c_{2j}$, of the sum over the concepts in the path
between them is returned, as shown below:

$$dist(w_1, w_2) = \min_{c_{1i} \in w_1,\ c_{2j} \in w_2} \sum_{c_k \in path(c_{1i}, c_{2j})} \frac{1}{depth(c_k)} \qquad (7)$$

Formulas (6) and (7) proved the most suitable of several other
possibilities for this task, including those which included full
definitions in (6) or those using other Conceptual Distance formulas,
cf. (Agirre and Rigau, 1996).

2.9 Combining the heuristics: Summing

As outlined in the beginning of this section, the way to combine all
the heuristics in one single decision is simple. The weights each
heuristic assigns to the rivaling senses of one genus are normalized
to the interval between 1 (best weight) and 0. Formula (8) shows the
normalized value a given heuristic will give to sense E of the genus,
according to the weight assigned by the heuristic to sense E and the
maximum weight over all the senses $E_i$ of the genus.

$$vote(O,E) = \frac{weight(O,E)}{\max_{E_i} weight(O,E_i)} \qquad (8)$$

The values thus collected from each heuristic are added up for each
competing sense. The order in which the heuristics are applied has no
relevance at all.

3 Evaluation

3.1 Test Set

In order to test the performance of each heuristic and their
combination, we selected two test sets at random (one per dictionary):
391 noun senses for DGILE and 115 noun senses for LPPL, which give
confidence rates of 95% and 91% respectively. From these samples, we
retained only those for which the automatic selection process selected
the correct genus (more than 97% in both dictionaries). Both test sets
were disambiguated by hand. Where necessary multiple correct senses
were allowed in both dictionaries. Table 2 shows the data for the test
sets.

                               DGILE         LPPL
Test Sampling                    391          115
Correct Genus Selected     382 (98%)    111 (97%)
Monosemous                  61 (16%)     40 (36%)
Senses per genus                2.75         2.29
idem (polysemous only)          3.64         3.02
Correct senses per genus        1.38         1.05
idem (polysemous only)          1.51         1.06

Table 2: Test Sets



3.2 Results 

Table 3 summarizes the results for polysemous genus.

In general, the results obtained for each heuristic seem to be poor,
but always over the random choice baseline (also shown in tables 3 and
4). The best heuristic according to recall in both dictionaries is the
sense ordering heuristic (2). For the rest, the difference in size of
the dictionaries could explain why cooccurrence-based heuristics (5
and 6) are the best for DGILE, and the worst for LPPL. Conceptual
distance (heuristic 8) gives the best precision for LPPL, but chooses
an average of 1.25 senses for each genus.

With the combination of the heuristics (Sum) we obtained an
improvement over sense ordering (heuristic 2) of 9 points (from 70% to
79%) in DGILE, and of 7 points (from 66% to 73%) in LPPL, maintaining
in both cases a coverage of 100%. Including monosemous genus in the
results (cf. table 4), the sum is able to correctly disambiguate 83%
of the genus in DGILE (8 points over sense ordering) and 82% of the
genus in LPPL (4 points over sense ordering). Note that we are adding
the results of eight different heuristics with eight different
performances, improving on the individual performance of each one.
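
The Sum combination reported here follows the scheme of section 2.9: each heuristic's weights are divided by the maximum weight it assigned to any sense of the genus, and the resulting votes are added per sense. A minimal sketch follows; the function names are hypothetical, and two details are assumptions rather than statements from the paper: a heuristic that does not apply is modeled as an empty dictionary contributing nothing, and weights are taken to be non-negative.

```python
def normalize(weights):
    """Divide each weight by the maximum weight among the candidate senses."""
    top = max(weights.values(), default=0.0)
    if top <= 0:
        return {sense: 0.0 for sense in weights}
    return {sense: w / top for sense, w in weights.items()}


def combine_by_sum(heuristic_outputs):
    """Add the normalized votes of every heuristic for each candidate sense."""
    totals = {}
    for weights in heuristic_outputs:
        for sense, vote in normalize(weights).items():
            totals[sense] = totals.get(sense, 0.0) + vote
    return totals


# Three toy heuristics scoring two senses of one genus term;
# the third heuristic abstains.
outputs = [
    {"planta_1": 1.0, "planta_2": 0.5},
    {"planta_1": 2.0, "planta_2": 6.0},
    {},
]
totals = combine_by_sum(outputs)
winner = max(totals, key=totals.get)
```

Because addition is commutative, the result does not depend on the order of `outputs`, which matches the remark that the order of application of the heuristics is irrelevant.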

In order to test the contribution of each heuris- 
tic to the total knowledge, we tested the sum of all 
the heuristics, eliminating one of them in turn. The 
results are provided in table 5. 



LPPL       random    (1)    (2)    (3)    (4)    (5)    (6)    (7)    (8)    Sum
recall        36%      -    66%      -     8%    11%    22%    11%    50%    73%
precision     36%      -    66%      -    66%    44%    61%    57%    76%    73%
coverage     100%      -   100%      -    12%    25%    36%    19%    66%   100%

DGILE      random    (1)    (2)    (3)    (4)    (5)    (6)    (7)    (8)    Sum
recall        30%      -    70%     1%    44%    57%    60%    57%    47%    79%
precision     30%      -    70%   100%    72%    57%    60%    58%    49%    79%
coverage     100%      -   100%     1%    61%   100%   100%    99%    95%   100%

Table 3: Results for polysemous genus.



LPPL       random    (1)    (2)    (3)    (4)    (5)    (6)    (7)    (8)    Sum
recall        59%    35%    78%      -    40%    42%    50%    42%    68%    82%
precision     59%   100%    78%      -    93%    82%    84%    88%    87%    82%
coverage     100%    35%   100%      -    43%    51%    59%    48%    78%   100%

DGILE      random    (1)    (2)    (3)    (4)    (5)    (6)    (7)    (8)    Sum
recall        41%    16%    75%     2%    41%    59%    63%    59%    48%    83%
precision     41%   100%    75%   100%    79%    65%    66%    63%    57%    83%
coverage     100%    16%   100%     2%    56%    95%    97%    94%    89%   100%

Table 4: Overall results.



LPPL          Sum   -(1)   -(2)   -(3)   -(4)   -(5)   -(6)   -(7)   -(8)
recall        82%    73%    74%      -    73%    76%    77%    77%    78%
precision     82%    73%    75%      -    73%    76%    77%    77%    78%
coverage     100%   100%    99%      -   100%   100%   100%   100%   100%

DGILE         Sum   -(1)   -(2)   -(3)   -(4)   -(5)   -(6)   -(7)   -(8)
recall        83%    79%    72%    81%    81%    81%    81%    81%    77%
precision     83%    79%    72%    82%    81%    81%    81%    81%    77%
coverage     100%   100%   100%    98%   100%   100%   100%   100%   100%

Table 5: Knowledge provided by each heuristic (overall results).



(Gale et al., 1993) estimate that any sense-identification system that
does not give the correct sense of polysemous words more than 75% of
the time would not be worth serious consideration. As table 5 shows,
this is not the case in our system. For instance, in DGILE heuristic 8
has the worst performance (see table 4, precision 57%), but it has the
second largest contribution (see table 5, precision decreases from 83%
to 77%). That is, even those heuristics with poor performance can
contribute knowledge that other heuristics do not provide.

3.3 Evaluation 

The difference in performance between the two dictionaries shows that
quality and size of resources is a key issue. Apparently the task of
disambiguating LPPL seems easier: less polysemy, more monosemous genus
and high precision of the sense ordering heuristic. However, the
heuristics that depend only on the size of the data (5, 6) perform
poorly on LPPL, while they are powerful methods for DGILE.

The results show that the combination of heuristics is useful, even if
the performance of some of the heuristics is low. The combination
performs better than isolated heuristics, and makes it possible to
disambiguate all the genus of the test set with a success rate of 83%
in DGILE and 82% in LPPL.

All the heuristics except heuristic 3 can readily be applied to any
other dictionary. Minimal parameter adjustment (window size,
cooccurrence weight formula and vector similarity function) should be
done to fit the characteristics of the dictionary, but according to
our results it does not significantly alter the results after
combining the heuristics.

4 Derived Lexical Knowledge Resources

4.1 Cooccurrence Data 



Following (Wilks et al., 1993) two words cooccur if they appear in the
same definition (word order in definitions is not taken into account).
For instance, for DGILE, a lexicon of 300,062 cooccurrence pairs among
40,193 word forms was derived (stop words were not taken into
account). Table 6 shows the first eleven words out of the 360 which
cooccur with vino (wine) ordered by Association Ratio. From left to
right, Association Ratio and number of occurrences. The lexicon (or
machine-tractable dictionary, MTD) thus produced from the dictionary
is used by heuristics 5 and 6.

words:  tinto (red), beber (to drink), mosto (must), jerez (sherry),
        cubas (cask, barrel), licor (liquor), bebida (drink),
        uva (grape), trago (drink, swig), sabor (taste), pan (bread)
AR:     11.1655, 10.0162, 9.6627, 8.6633, 8.1051, 7.2127, 6.9338,
        6.8436, 6.6221, 6.4506
#oc.:   15, 23, 14, [illegible], 12, 12

Table 6: Example of association ratio for vino (wine).

4.2 Multilingual Data

Heuristics 7 and 8 need external knowledge, not present in the
dictionaries themselves. This knowledge is composed of semantic field
tags and hierarchical structures, and both were extracted from
WordNet. In order to do this, the gap between our working languages
and English was filled with two bilingual dictionaries. For this
purpose, we derived a list of links for each word in Spanish and
French as follows.

Firstly, each Spanish or French word was looked up in the bilingual
dictionary, and its English translation was found. For each
translation WordNet yielded its senses, in the form of WordNet
concepts (synsets). The pair made of the original word and each of the
concepts linked to it was included in a file, thus producing an MTD
with links between Spanish or French words and WordNet concepts.
Obviously some of these links are not correct, as the translation
given by the bilingual dictionary need not hold for every sense of the
English word listed in WordNet. The heuristics using these MTDs are
aware of this.

For instance, when accessing the semantic fields for vin (French) we
get a unique translation, wine, which has two senses in WordNet:
<wine, vino> as a beverage, and <wine, wine-coloured> as a kind of
color. In this example two links would be produced: (vin, <wine,
vino>) and (vin, <wine, wine-coloured>). These links allow us to get
two possible semantic fields for vin (noun.food, file 13, and
noun.attribute, file 7) and the whole structure of the hierarchy in
WordNet for each of the concepts.

5 Comparison with Previous Work

Several approaches have been proposed for attaching the correct sense
(from a set of prescribed ones) of a word in context. Some of them
have been fully tested in real size texts (e.g. statistical methods
(Yarowsky, 1992), (Yarowsky, 1994), (Miller and Teibel, 1991),
knowledge based methods (Sussna, 1993), (Agirre and Rigau, 1996), or
mixed methods (Richardson et al., 1994), (Resnik, 1995)). The
performance of WSD is reaching a high level, although usually only
small sets of words with clear sense distinctions are selected for
disambiguation (e.g. (Yarowsky, 1995) reports a success rate of 96%
disambiguating twelve words with two clear sense distinctions each).

This paper has presented a general technique for WSD which is a
combination of statistical and knowledge based methods, and which has
been applied to disambiguate all the genus terms in two dictionaries.

Although this latter task could be seen as easier than general WSD^4,
genus are usually frequent and general words with high ambiguity^5.
While the average number of senses per noun in DGILE is 1.8, the
average number of senses per noun genus is 2.75 (1.30 and 2.29
respectively for LPPL). Furthermore, it is not possible to apply the
powerful "one sense per discourse" property (Yarowsky, 1995) because
there is no discourse in dictionaries.

WSD is a very difficult task even for humans^6, but semiautomatic
techniques to disambiguate genus have been broadly used (Amsler, 1981)
(Vossen and Serail, 1990) (Ageno et al., 1992) (Artola, 1993) and some
attempts to do automatic genus disambiguation have been performed
using the semantic codes of the dictionary (Bruce et al., 1992) or
using cooccurrence data extracted from the dictionary itself (Wilks et
al., 1993).

Selecting the correct sense for LDOCE genus terms, (Bruce et al.,
1992) report a success rate of 80% (90% after hand coding of ten
genus). This impressive rate is achieved using the intrinsic
characteristics of LDOCE. Furthermore, using only the implicit
information contained in the dictionary definitions of LDOCE, (Cowie
et al., 1992) report a success rate of 47% at a sense level. (Wilks et
al., 1993) report a success rate of 45% disambiguating the word bank
(thirteen senses in LDOCE) using a technique similar to heuristic 6.
In our case, combining informed heuristics and without explicit
semantic tags, the success rates are 83% and 82% overall, and 95% and
75% for two-way ambiguous genus (DGILE and LPPL data, respectively).
Moreover, 93% and 92% of the time the real solution is among the first
and second proposed solutions.

^4 In contrast to other sense distinctions, dictionary word senses
frequently differ in subtle distinctions (only some of which have to
do with meaning (Gale et al., 1993)), producing a large set of closely
related dictionary senses (Jacobs, 1991).

^5 However, in dictionary definitions the headword and the genus term
have to be the same part of speech.

^6 (Wilks et al., 1993), disambiguating 197 occurrences of the word
bank in LDOCE, say it "was not an easy task, as some of the usages of
bank did not seem to fit any of the definitions very well". Also
(Miller et al., 1994), semantically tagging SemCor by hand, measure an
error rate around 10% for polysemous words.

6 Conclusion and Future Work 

The results show that computer aided construction of taxonomies using
lexical resources is not limited to highly structured dictionaries
such as LDOCE, but has been successfully achieved with two very
different dictionaries. All the heuristics used are unsupervised, in
the sense that they do not need hand-coding of any kind, and the
proposed method can be adapted to any dictionary with minimal
parameter setting.

Nevertheless, quality and size of the lexical knowledge resources are
important. As the results for LPPL show, small dictionaries with short
definitions cannot profit from raw corpus techniques (heuristics 5,
6), and consequently the improvement of precision over the random
baseline or first-sense heuristic is lower than in DGILE.

We have also shown that such a simple technique as just summing is a
useful way to combine knowledge from several unsupervised WSD methods,
raising the performance of each one in isolation (coverage and/or
precision). Furthermore, even those heuristics with apparently poor
results provide knowledge to the final result not provided by the rest
of the heuristics. Thus, adding new heuristics with different
methodologies and different knowledge (e.g. from corpora) as they
become available will certainly improve the results.

Needless to say, several improvements can be made both in the
individual heuristics and in the method to combine them. For instance,
the cooccurrence heuristics have been applied quite indiscriminately,
even in low frequency conditions. Significance tests or association
coefficients could be used in order to discard low confidence
decisions. Also, instead of just summing, more clever combinations can
be tried, such as training classifiers which use the heuristics as
predictor variables.

Although we used these techniques for genus disambiguation we expect
similar results (or even better ones, given the "one sense per
discourse" property and lexical knowledge acquired from corpora) for
the WSD problem.